Normally-Off Computing Design Methodology Using Spintronics: From Devices to Architectures
Energy-harvesting-powered computing offers broad opportunities to transform Internet of Things (IoT) devices and wireless sensor networks by using ambient light, thermal, kinetic, and electromagnetic energy to achieve battery-free computing. To operate within the restricted energy capacity and intermittent supply profile of battery-free operation, Elastic Intermittent Computation (EIC) is proposed as a new duty-cycle-variable computing approach that leverages the non-volatility inherent in post-CMOS switching devices. The foundations of EIC will be built from the ground up by extending Spin Hall Effect Magnetic Tunnel Junction (SHE-MTJ) device models to realize SHE-MTJ-based Majority Gate (MG) and Polymorphic Gate (PG) logic approaches and libraries. These leverage intrinsic non-volatility to realize middleware-coherent intermittent computation without checkpointing, micro-tasking, software bloat, or the associated energy overheads, which is vital for IoT. Device-level EIC research concentrates on encapsulating SHE-MTJ behavior in a compact model that exploits the device's non-volatility to provide intermittent computation intrinsically and to reduce lifetime energy. Based on this model, the circuit-level EIC contributions will entail the design, simulation, and analysis of PG-based spintronic logic that is adaptable at the gate level to support variable-duty-cycle execution robust to brief and extended supply outages or unscheduled dropouts, as well as the development of research-grade spin-based synthesis and optimization routines compatible with existing commercial toolchains. These tools will be employed to design a hybrid post-CMOS processing unit that uses pipelining and power gating through state-holding properties within the datapath itself, thus eliminating checkpointing and data transfer operations.
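The MG-based libraries mentioned above rely on a standard property of majority logic: fixing one input of a three-input majority gate to 0 or 1 turns it into a two-input AND or OR. The following Python sketch is a purely behavioral model of that property; the function names are illustrative, not taken from the project, and nothing here models SHE-MTJ device physics.

```python
from itertools import product


def maj(a: int, b: int, c: int) -> int:
    """Three-input majority gate: output is 1 when at least two inputs are 1."""
    return 1 if a + b + c >= 2 else 0


def and_via_maj(a: int, b: int) -> int:
    # Tying the third input to logic 0 turns MAJ into a two-input AND.
    return maj(a, b, 0)


def or_via_maj(a: int, b: int) -> int:
    # Tying the third input to logic 1 turns MAJ into a two-input OR.
    return maj(a, b, 1)


if __name__ == "__main__":
    print("a b AND OR")
    for a, b in product((0, 1), repeat=2):
        print(a, b, and_via_maj(a, b), or_via_maj(a, b))
```

Making the third input a programmable value rather than a constant is one way to think about why majority-based cells lend themselves to reconfigurable (polymorphic) logic.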
Enabling Intelligent IoTs for Histopathology Image Analysis Using Convolutional Neural Networks
Medical imaging is an essential data source leveraged by healthcare systems worldwide. In pathology, histopathology images are used for cancer diagnosis; however, these images are very complex, and their analysis by pathologists requires substantial time and effort. Meanwhile, although convolutional neural networks (CNNs) have produced near-human results in image processing tasks, their processing time keeps growing and they require ever more computational power. In this paper, we implement a quantized ResNet model on two histopathology image datasets to optimize inference power consumption. We analyze classification accuracy, energy estimation, and hardware utilization metrics to evaluate our method. First, the original RGB-colored images are used in the training phase, and then compression methods such as channel reduction and sparsity are applied. Our results show a 6% accuracy increase from RGB at 32-bit (baseline) to the optimized representation of sparsity on RGB with a lower bit-width, i.e., <8:8>. For energy estimation on the CNN model used, we found that the energy consumed in RGB color mode at 32-bit is considerably higher than in the lower-bit-width and compressed color modes. Moreover, we show that lower-bit-width implementations yield higher resource utilization and a lower memory bottleneck ratio. This work is suitable for inference on energy-limited devices, which are increasingly used in Internet of Things (IoT) systems that support healthcare.
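The two compression levers described above, lower bit-width and sparsity, can be illustrated on a plain list of weights. This is a minimal sketch that assumes uniform symmetric per-tensor quantization and magnitude pruning; the abstract does not specify the paper's actual quantization or sparsification scheme, and the function names are hypothetical.

```python
def quantize_symmetric(values, bits=8):
    """Uniform symmetric per-tensor quantization to signed `bits`-bit integers.

    Returns the integer codes and the scale that maps them back to floats.
    """
    qmax = 2 ** (bits - 1) - 1  # e.g. 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the signed range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


def magnitude_prune(values, sparsity):
    """Zero out the `sparsity` fraction of entries with the smallest magnitude."""
    k = int(len(values) * sparsity)
    smallest = sorted(range(len(values)), key=lambda i: abs(values[i]))[:k]
    drop = set(smallest)
    return [0.0 if i in drop else v for i, v in enumerate(values)]


if __name__ == "__main__":
    weights = [0.42, -0.07, 0.91, -0.33, 0.02, -0.88]
    sparse = magnitude_prune(weights, sparsity=0.5)
    codes, scale = quantize_symmetric(sparse, bits=8)
    print("pruned:", sparse)
    print("int8 codes:", codes, "scale:", scale)
    print("reconstructed:", dequantize(codes, scale))
```

Storing 8-bit codes plus one scale, instead of 32-bit floats, is where the memory and energy savings of a lower bit-width come from; the zeros introduced by pruning can additionally be skipped by hardware that exploits sparsity.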
Ultra-Fast, High-Performance 8x8 Approximate Multipliers by a New Multicolumn 3,3:2 Inexact Compressor and its Derivatives
The multiplier, a key component in many applications, is a time-consuming and energy-intensive computation block. Approximate computing is a practical design paradigm that attempts to improve hardware efficiency while keeping computation quality satisfactory. A novel multicolumn 3,3:2 inexact compressor is presented in this paper. It takes three partial products from each of two adjacent columns for rapid partial product reduction. The proposed inexact compressor and its derivatives enable us to design a high-speed approximate multiplier. Then, another ultra-fast, highly efficient approximate multiplier is achieved using a systematic truncation strategy. The proposed multipliers accumulate partial products in only two stages, one fewer than other approximate multipliers in the literature. Implementation results using Synopsys Design Compiler at the 45 nm technology node demonstrate nearly 11.11% higher speed for the second proposed design over the fastest existing approximate multiplier. Furthermore, the new approximate multipliers are applied to the image processing application of image sharpening, where their performance is highly satisfactory. This paper also shows that the error pattern of an approximate multiplier, in addition to the mean error distance and error rate, directly affects the outcomes of the image processing application.
Comment: 21 pages, 18 figures, 6 tables
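The accuracy metrics named above, error rate (ER) and mean error distance (MED), can be computed exhaustively for an 8×8 design because there are only 2^16 input pairs. The sketch below uses a simple column-truncation multiplier (`truncated_mul`, hypothetical) as the approximate design. It is not the paper's 3,3:2 compressor, only a stand-in to show how the metrics are evaluated; NMED here is MED normalized by the largest exact product.

```python
from functools import partial
from itertools import product


def truncated_mul(a: int, b: int, k: int = 4, width: int = 8) -> int:
    """Hypothetical approximate multiplier: discard partial-product bits in the
    k least-significant columns (i + j < k) and sum the rest exactly."""
    result = 0
    for i in range(width):
        if not (a >> i) & 1:
            continue
        for j in range(width):
            if i + j >= k and (b >> j) & 1:
                result += 1 << (i + j)
    return result


def error_metrics(approx, width: int = 8):
    """Exhaustively compute ER, MED, and NMED over all width x width input pairs."""
    n = 1 << width
    wrong = 0
    total_distance = 0
    for a, b in product(range(n), repeat=2):
        distance = abs(approx(a, b) - a * b)  # error distance for this input pair
        total_distance += distance
        wrong += distance != 0
    pairs = n * n
    er = wrong / pairs
    med = total_distance / pairs
    nmed = med / ((n - 1) ** 2)  # normalize by the largest exact product
    return er, med, nmed


if __name__ == "__main__":
    er, med, nmed = error_metrics(partial(truncated_mul, k=4, width=8), width=8)
    print(f"ER = {er:.4f}, MED = {med:.2f}, NMED = {nmed:.6f}")
```

Because truncation only ever drops nonnegative terms, this stand-in always underestimates the exact product. Two designs with similar MED and ER can still differ in whether their errors are one-sided or symmetric, which is the kind of error-pattern effect the abstract reports for image sharpening.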
NV-Clustering: Normally-Off Computing Using Non-Volatile Datapaths
With technology downscaling, static power dissipation presents a crucial challenge to multicore, many-core, and System-on-Chip (SoC) architectures, due to the increased role of leakage currents in overall energy consumption and the need to support power-gating schemes. Herein, a non-volatile (NV) flip-flop design approach, referred to as NV-Clustering, is developed to realize middleware-transparent intermittent computing. First, a Logic-Embedded Flip-Flop (LE-FF) is developed to realize rudimentary Boolean logic functions along with an inherent state-holding capability within a compact footprint. Second, the NV-Clustering synthesis procedure and corresponding tool module are used to instantiate LE-FF library cells within conventional Register Transfer Language (RTL) specifications. This selectively clusters logic and NV state-holding functionality together based on energy and area minimization criteria. NV-Clustering is applied to a wide range of benchmarks, including the ISCAS-89, MCNC, and ITC-99 computational circuits, using an LE-FF based on the Spin Hall Effect (SHE)-assisted Spin Transfer Torque (STT) Magnetic Tunnel Junction (MTJ). Simulation results validate functionality and show power dissipation, area, and delay benefits. For instance, results for the ISCAS-89 benchmarks indicate a 15 percent area reduction on average, up to 22 percent reduction in energy consumption, and up to 14 percent reduction in delay compared to alternative NV-FF based designs, as evaluated via SPICE simulation at the 45 nm technology node.
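Why state-holding flip-flops enable intermittent computing without checkpointing can be illustrated with a purely behavioral model: a register that advances by one unit of work per powered cycle, driven by a power trace with an outage. The sketch assumes that a volatile register resets to 0 on every outage while an NV register keeps its value; it ignores write energy, restore latency, and retention limits, and all names and the trace are hypothetical.

```python
from typing import Optional, Sequence


def cycles_to_finish(power_trace: Sequence[bool], non_volatile: bool, target: int) -> Optional[int]:
    """Return the cycle index at which `target` units of work complete, or None.

    Each powered cycle performs one unit of work held in a state register.
    A volatile register loses its content during an outage; an NV register keeps it.
    """
    state = 0
    for cycle, powered in enumerate(power_trace):
        if not powered:
            if not non_volatile:
                state = 0  # volatile flip-flop content is lost when power drops
            continue
        state += 1
        if state == target:
            return cycle
    return None


if __name__ == "__main__":
    # Three powered cycles, one outage, then three more powered cycles.
    trace = [True, True, True, False, True, True, True]
    print("NV register:      ", cycles_to_finish(trace, non_volatile=True, target=5))   # 5
    print("volatile register:", cycles_to_finish(trace, non_volatile=False, target=5))  # None
```

With the NV register, work resumes where it stopped after the outage; the volatile register restarts from zero and never finishes within this trace, which is the forward-progress problem that checkpointing would otherwise have to solve.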
Energy-Efficient and Process-Variation-Resilient Write Circuit Schemes for Spin Hall Effect MRAM Device
In this paper, various energy-efficient write schemes are proposed for the switching operation of spin Hall effect (SHE)-based magnetic tunnel junctions (MTJs). A transmission gate (TG)-based write scheme is proposed, which provides symmetric and energy-efficient switching behavior. We modeled an SHE-MTJ using precise physics equations and then used the model in a SPICE circuit simulator to verify the functionality of our designs. Simulation results show the advantages of the TG-based write scheme in terms of device count and switching energy. In particular, it can operate at a 12% higher clock frequency while realizing at least a 13% reduction in energy consumption compared to the most energy-efficient write circuits. We analyzed the performance of the implemented write circuits in the presence of process variation (PV) in the transistors' threshold voltage and SHE-MTJ dimensions. Results show that the proposed TG-based design is the second most PV-resilient write circuit scheme for SHE-MTJs among the implemented designs. Finally, we proposed the 1TG-1T-1R SHE-based magnetic random access memory (MRAM) bit cell based on the TG-based write circuit. Comparisons with several of the most energy-efficient and variation-resilient SHE-MRAM cells indicate that 1TG-1T-1R delivers reduced energy consumption, with 43.9% and 10.7% energy-delay product improvements, while incurring low area overhead.
Logic-Encrypted Synthesis for Energy-Harvesting-Powered Spintronic-Embedded Datapath Design
The objectives of advancing secure, intermittency-tolerant, and energy-aware logic datapaths are addressed herein by developing a spin-based design methodology and its corresponding synthesis steps. The approach selectively inserts Non-Volatile (NV) Polymorphic Gates (PGs) to realize datapaths suitable for intrinsic operation in Energy-Harvesting-Powered (EHP) devices. Spin Hall Effect (SHE)-based Magnetic Tunnel Junctions (MTJs) are used to design NV-PGs, which are combined within a Flip-Flop (FF) circuit to develop a PG-FF that realizes Boolean logic functions with inherent state-holding capability. The reconfigurability of PGs is leveraged for logic encryption to enhance the security of the developed intermittency-resilient circuits, which are applied to the ISCAS-89, MCNC, and ITC-99 benchmarks. The results indicate that the PG-FF based design can achieve up to 7.1% and 13.6% improvements in area and Power Delay Product (PDP), respectively, compared to NV-FF based methodologies that replace CMOS-based FFs with NV-FFs. Further PDP improvements are achieved by using low-energy-barrier SHE-MTJ devices within the PG-FF circuit. SHE-MTJs with a 30 kT energy barrier exhibit a 40.5% reduction in PDP at the cost of lower retention times, in the range of minutes, which is still sufficient to achieve forward progress in EHP devices that undergo hundreds or more power-on and power-off cycles per minute.
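The idea of using PG reconfigurability for logic encryption can be sketched behaviorally: each PG implements one of several Boolean functions selected by a configuration key, so a netlist of PGs computes the intended function only under the correct key. The function menu, the two-gate netlist, and the key values below are hypothetical; the abstract does not describe the actual PG configurations or how keys are assigned.

```python
from itertools import product

# Hypothetical function menu for a two-input polymorphic gate; the 2-bit key selects one.
PG_FUNCTIONS = {
    0b00: lambda x, y: x & y,        # AND
    0b01: lambda x, y: x | y,        # OR
    0b10: lambda x, y: x ^ y,        # XOR
    0b11: lambda x, y: 1 - (x & y),  # NAND
}


def pg(key: int, x: int, y: int) -> int:
    """Evaluate a polymorphic gate configured by `key`."""
    return PG_FUNCTIONS[key](x, y)


def locked_circuit(key, a, b, c):
    """Two-PG netlist; it realizes (a AND b) XOR c only when key == (0b00, 0b10)."""
    k1, k2 = key
    return pg(k2, pg(k1, a, b), c)


def reference(a, b, c):
    """The intended (unlocked) function."""
    return (a & b) ^ c


def mismatches(key) -> int:
    """Number of input vectors on which the locked circuit disagrees with the reference."""
    return sum(locked_circuit(key, *v) != reference(*v) for v in product((0, 1), repeat=3))


CORRECT_KEY = (0b00, 0b10)

if __name__ == "__main__":
    for key in product(PG_FUNCTIONS, repeat=2):
        print(key, mismatches(key))
```

In this toy netlist, every key other than the correct one corrupts at least one output, which is the property a logic-locking scheme relies on; real schemes also have to resist key-recovery attacks, which this sketch does not address.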
Design and Evaluation of an Ultra-Area-Efficient Fault-Tolerant QCA Full Adder
Quantum-dot cellular automata (QCA) has been studied extensively as a promising switching technology at the nanoscale. Despite several potential advantages of QCA-based designs over conventional CMOS logic, deposition defects are likely to occur in QCA-based systems, which necessitates fault-tolerant structures. Since binary adders are among the most frequently used components in digital systems, this work targets the design of a highly optimized, robust full adder in a QCA framework. Results demonstrate the superiority of the proposed full adder in terms of latency, complexity, and area with respect to previous full adder designs. Further, the functionality and defect tolerance of the proposed full adder in the presence of QCA deposition faults are studied. The functionality and correctness of our design are confirmed using high-level synthesis, followed by delineating its normal and faulty behavior using a Probabilistic Transfer Matrix (PTM) method. Waveforms generated with the QCADesigner simulation tool, which verify the robustness of the proposed designs, are also discussed.
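Majority gates are the native logic primitive in QCA, and a commonly cited full adder uses three of them: Cout = M(A, B, Cin) and Sum = M(¬Cout, Cin, M(A, B, ¬Cin)). The sketch below checks that formulation and estimates reliability under an assumed fault model in which each majority gate's output flips independently with probability p (inverters treated as fault-free). It builds a per-gate PTM for illustration but, instead of composing matrices, enumerates all fault patterns directly, which is tractable for three gates. This is not the paper's fault-tolerant layout, and cell-level deposition defects are not modeled.

```python
from itertools import product


def maj(a: int, b: int, c: int) -> int:
    """Three-input majority gate, the basic logic primitive in QCA."""
    return 1 if a + b + c >= 2 else 0


def full_adder(a, b, cin, flips=(0, 0, 0)):
    """Three-majority-gate full adder returning (sum, cout).

    flips[k] == 1 models an output fault on gate k (assumed, simplified fault model).
    """
    cout = maj(a, b, cin) ^ flips[0]
    m2 = maj(a, b, 1 - cin) ^ flips[1]
    s = maj(1 - cout, cin, m2) ^ flips[2]  # a faulty cout propagates into the sum gate
    return s, cout


def majority_ptm(p):
    """PTM of a majority gate whose output flips with probability p.

    Rows are input combinations abc = 000..111; columns are P(out = 0), P(out = 1).
    """
    rows = []
    for a, b, c in product((0, 1), repeat=3):
        rows.append([1 - p, p] if maj(a, b, c) == 0 else [p, 1 - p])
    return rows


def prob_correct(p):
    """Probability that (sum, cout) is correct, averaged over all 8 input combinations,
    with each gate failing independently with probability p."""
    total = 0.0
    for a, b, cin in product((0, 1), repeat=3):
        expected = ((a + b + cin) & 1, (a + b + cin) >> 1)
        for flips in product((0, 1), repeat=3):
            weight = 1.0
            for f in flips:
                weight *= p if f else 1 - p
            if full_adder(a, b, cin, flips) == expected:
                total += weight
    return total / 8


if __name__ == "__main__":
    for p in (0.0, 0.01, 0.05):
        print(f"p = {p}: P(correct) = {prob_correct(p):.4f}")
```

Some faults on the internal gate M(A, B, ¬Cin) are masked by the final majority vote for certain inputs, so the adder's reliability is not simply the product of per-gate reliabilities; quantifying this kind of masking is what a PTM-based analysis is used for.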